AITopics

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Stefan Meintrup, Alexander Munteanu, Dennis Rohde

Random Projections and Sampling Algorithms for Clustering of High-Dimensional Polygonal Curves

Neural Information Processing SystemsFeb-19-2026, 09:47:31 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, chet distance, polygonal curve, (15 more...)

Country:

Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)
North America > United States (0.04)
North America > Canada (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Neural Information Processing SystemsFeb-8-2026, 16:37:22 GMT

Revisiting the Evaluation of Image Synthesis with GANs Mengping Y ang 1, Ceyuan Y ang

Unlike most vision tasks that have per-sample ground-truth, image synthesis tasks target generating unseen data and hence are usually evaluated through a distributional distance between one set of real samples and another set of generated samples.

artificial intelligence, extractor, machine learning, (18 more...)

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine LearningOct-10-2025

Optimal Stopping in Latent Diffusion Models

Wu, Yu-Han, Berthet, Quentin, Biau, Gérard, Boyer, Claire, Elie, Romuald, Marion, Pierre

We identify and analyze a surprising phenomenon of Latent Diffusion Models (LDMs) where the final steps of the diffusion can degrade sample quality. In contrast to conventional arguments that justify early stopping for numerical stability, this phenomenon is intrinsic to the dimensionality reduction in LDMs. We provide a principled explanation by analyzing the interaction between latent dimension and stopping time. Under a Gaussian framework with linear autoencoders, we characterize the conditions under which early stopping is needed to minimize the distance between generated and target distributions. More precisely, we show that lower-dimensional representations benefit from earlier termination, whereas higher-dimensional latent spaces require later stopping time. We further establish that the latent dimension interplays with other hyperparameters of the problem such as constraints in the parameters of score matching. Experiments on synthetic and real datasets illustrate these properties, underlining that early stopping can improve generative quality. Together, our results offer a theoretical foundation for understanding how the latent dimension influences the sample quality, and highlight stopping time as a key hyperparameter in LDMs.

covariance matrix, diffusion model, gaussian distribution, (13 more...)

arXiv.org Machine Learning

2510.08409

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre:

Workflow (0.88)
Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Stefan Meintrup, Alexander Munteanu, Dennis Rohde

Random Projections and Sampling Algorithms for Clustering of High-Dimensional Polygonal Curves

Neural Information Processing SystemsOct-1-2025, 23:32:24 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, data mining, machine learning, (18 more...)

Country:

North America (0.28)
Europe > Germany (0.14)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial IntelligenceFeb-26-2025

A Pragmatic Note on Evaluating Generative Models with Fr\'echet Inception Distance for Retinal Image Synthesis

Wu, Yuli, Liu, Fucheng, Yilmaz, Rüveyda, Konermann, Henning, Walter, Peter, Stegmaier, Johannes

Fr\'echet Inception Distance (FID), computed with an ImageNet pretrained Inception-v3 network, is widely used as a state-of-the-art evaluation metric for generative models. It assumes that feature vectors from Inception-v3 follow a multivariate Gaussian distribution and calculates the 2-Wasserstein distance based on their means and covariances. While FID effectively measures how closely synthetic data match real data in many image synthesis tasks, the primary goal in biomedical generative models is often to enrich training datasets ideally with corresponding annotations. For this purpose, the gold standard for evaluating generative models is to incorporate synthetic data into downstream task training, such as classification and segmentation, to pragmatically assess its performance. In this paper, we examine cases from retinal imaging modalities, including color fundus photography and optical coherence tomography, where FID and its related metrics misalign with task-specific evaluation goals in classification and segmentation. We highlight the limitations of using various metrics, represented by FID and its variants, as evaluation criteria for these applications and address their potential caveats in broader biomedical imaging modalities and downstream tasks.

downstream performance, evaluation metric, generative model, (13 more...)

2502.1716

Country: Europe > Germany (0.04)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Luo, Ge Ya, Favero, Gian Mario, Luo, Zhi Hao, Jolicoeur-Martineau, Alexia, Pal, Christopher

Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality

arXiv.org Artificial IntelligenceOct-8-2024

The Fr\'echet Video Distance (FVD) is a widely adopted metric for evaluating video generation distribution quality. However, its effectiveness relies on critical assumptions. Our analysis reveals three significant limitations: (1) the non-Gaussianity of the Inflated 3D Convnet (I3D) feature space; (2) the insensitivity of I3D features to temporal distortions; (3) the impractical sample sizes required for reliable estimation. These findings undermine FVD's reliability and show that FVD falls short as a standalone metric for video generation evaluation. After extensive analysis of a wide range of metrics and backbone architectures, we propose JEDi, the JEPA Embedding Distance, based on features derived from a Joint Embedding Predictive Architecture, measured using Maximum Mean Discrepancy with polynomial kernel. Our experiments on multiple open-source datasets show clear evidence that it is a superior alternative to the widely used FVD metric, requiring only 16% of the samples to reach its steady value, while increasing alignment with human evaluation by 34%, on average.

dataset, distortion, feature space, (15 more...)

2410.05203

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Bhatt, Maulik, Zhen, HongHao, Kennedy, Monroe III, Mehr, Negar

Understanding and Imitating Human-Robot Motion with Restricted Visual Fields

arXiv.org Artificial IntelligenceOct-7-2024

When working around humans, it is important to model their perception limitations in order to predict their behavior more accurately. In this work, we consider agents with a limited field of view, viewing range, and ability to miss objects within viewing range (e.g., transparency). By considering the observation model independently from the motion policy, we can better predict the agent's behavior by considering these limitations and approximating them. We perform a user study where human operators navigate a cluttered scene while scanning the region for obstacles with a limited field of view and range. Using imitation learning, we show that a robot can adopt a human's strategy for observing an environment with limitations on observation and navigate with minimal collision with dynamic and static obstacles. We also show that this learned model helps it successfully navigate a physical hardware vehicle in real time.

agent, obstacle, trajectory, (16 more...)

2410.05547

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Japan > Honshū > Kansai > Hyogo Prefecture > Kobe (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Fagan, Peter David, Ramamoorthy, Subramanian

Learning from Demonstration with Implicit Nonlinear Dynamics Models

arXiv.org Artificial IntelligenceOct-1-2024

Learning from Demonstration (LfD) is a useful paradigm for training policies that solve tasks involving complex motions, such as those encountered in robotic manipulation. In practice, the successful application of LfD requires overcoming error accumulation during policy execution, i.e. the problem of drift due to errors compounding over time and the consequent out-of-distribution behaviours. Existing works seek to address this problem through scaling data collection, correcting policy errors with a human-in-the-loop, temporally ensembling policy predictions or through learning a dynamical system model with convergence guarantees. In this work, we propose and validate an alternative approach to overcoming this issue. Inspired by reservoir computing, we develop a recurrent neural network layer that includes a fixed nonlinear dynamical system with tunable dynamical properties for modelling temporal dynamics. We validate the efficacy of our neural network layer on the task of reproducing human handwriting motions using the LASA Human Handwriting Dataset. Through empirical experiments we demonstrate that incorporating our layer into existing neural network architectures addresses the issue of compounding errors in LfD. Furthermore, we perform a comparative evaluation against existing approaches including a temporal ensemble of policy predictions and an Echo State Network (ESN) implementation. We find that our approach yields greater policy precision and robustness on the handwriting task while also generalising to multiple dynamics regimes and maintaining competitive latency scores.

artificial intelligence, deep learning, machine learning, (15 more...)

2409.18768

Country:

Europe > United Kingdom (0.14)
Asia > China (0.14)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Chen, Gong, Meghjani, Malika, Prasetyo, Marcel Bartholomeus

Robot Guided Evacuation with Viewpoint Constraints

arXiv.org Artificial IntelligenceSep-28-2024

We present a viewpoint-based non-linear Model Predictive Control (MPC) for evacuation guiding robots. Specifically, the proposed MPC algorithm enables evacuation guiding robots to track and guide cooperative human targets in emergency scenarios. Our algorithm accounts for the environment layout as well as distances between the robot and human target and distance to the goal location. A key challenge for evacuation guiding robot is the trade-off between its planned motion for leading the target toward a goal position and staying in the target's viewpoint while maintaining line-of-sight for guiding. We illustrate the effectiveness of our proposed evacuation guiding algorithm in both simulated and real-world environments with an Unmanned Aerial Vehicle (UAV) guiding a human. Our results suggest that using the contextual information from the environment for motion planning, increases the visibility of the guiding UAV to the human while achieving faster total evacuation time.

agent, algorithm, artificial intelligence, (15 more...)

2409.19466

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.86)

Industry: Energy > Oil & Gas > Upstream (0.35)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.34)